Results 1 - 20 of 98
1.
Bioinformatics ; 2024 Apr 22.
Article in English | MEDLINE | ID: mdl-38648741

ABSTRACT

SUMMARY: SIMSApiper is a Nextflow pipeline that creates reliable, structure-informed MSAs of thousands of protein sequences in time-frames faster than standard structure-based alignment methods. Structural information can be provided by the user or collected by the pipeline from online resources. Parallelization with sequence identity based subsets can be activated to significantly speed up the alignment process. Finally, the number of gaps in the final alignment can be reduced by leveraging the position of conserved secondary structure elements. AVAILABILITY AND IMPLEMENTATION: The pipeline is implemented using Nextflow, Python3 and Bash. It is publicly available on github.com/Bio2Byte/simsapiper. SUPPLEMENTARY INFORMATION: All data is available on GitHub.
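
The idea of splitting an input set into sequence-identity-based subsets before alignment can be illustrated with a small sketch. This is not SIMSApiper's implementation, which the abstract does not describe; the function names, the 0.8 threshold, the toy sequences, and the use of `difflib` as a crude stand-in for pairwise sequence identity are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crude stand-in for pairwise sequence identity, in the range 0..1.

    A real pipeline would derive identity from an actual alignment or a
    dedicated clustering tool; difflib is used here only to stay stdlib-only.
    """
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def identity_subsets(seqs: dict[str, str], threshold: float = 0.8) -> list[list[str]]:
    """Greedy grouping: each sequence joins the first subset whose
    representative (its first member) it matches at or above the threshold;
    otherwise it starts a new subset."""
    subsets: list[list[str]] = []
    for name, seq in seqs.items():
        for subset in subsets:
            if similarity(seq, seqs[subset[0]]) >= threshold:
                subset.append(name)
                break
        else:
            subsets.append([name])
    return subsets


# Hypothetical toy sequences, not taken from any dataset.
seqs = {
    "a": "MKTAYIAKQRQISFVKSHFSRQ",
    "b": "MKTAYIAKQRQISFVKSHFSRE",
    "c": "GSHMLEDPVDAFKLNWWQ",
}
groups = identity_subsets(seqs, threshold=0.8)
```

Each resulting subset could then be aligned independently and in parallel, which is the kind of speed-up the abstract refers to.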

2.
Elife ; 12, 2024 Feb 16.
Article in English | MEDLINE | ID: mdl-38363283

ABSTRACT

The RNA recognition motif (RRM) is the most common RNA-binding protein domain identified in nature. However, RRM-containing proteins are only prevalent in eukaryotic phyla, in which they play central regulatory roles. Here, we engineered an orthogonal post-transcriptional control system of gene expression in the bacterium Escherichia coli with the mammalian RNA-binding protein Musashi-1, which is a stem cell marker with a neurodevelopmental role that contains two canonical RRMs. In the circuit, Musashi-1 is regulated transcriptionally and works as an allosteric translation repressor thanks to a specific interaction with the N-terminal coding region of a messenger RNA and its structural plasticity to respond to fatty acids. We fully characterized the genetic system at the population and single-cell levels, showing a significant fold change in reporter expression, and the underlying molecular mechanism by assessing the in vitro binding kinetics and in vivo functionality of a series of RNA mutants. The dynamic response of the system was well recapitulated by a bottom-up mathematical model. Moreover, we applied the post-transcriptional mechanism engineered with Musashi-1 to specifically regulate a gene within an operon, implement combinatorial regulation, and reduce protein expression noise. This work illustrates how RRM-based regulation can be adapted to simple organisms, thereby adding a new regulatory layer in prokaryotes for translation control.


Subjects
Nerve Tissue Proteins , RNA-Binding Proteins , Animals , Nerve Tissue Proteins/metabolism , RNA-Binding Proteins/metabolism , RNA/metabolism , Messenger RNA/metabolism , Escherichia coli/genetics , Escherichia coli/metabolism , Mammals/genetics
3.
NAR Genom Bioinform ; 6(1): lqae002, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38288375

ABSTRACT

The RNA recognition motif (RRM) is the most prevalent RNA binding domain in eukaryotes and is involved in most RNA metabolism processes. Single RRM domains have a limited RNA specificity and affinity and tend to be accompanied by other RNA binding domains, frequently additional RRMs that contribute to an avidity effect. Within multi-RRM proteins, the most common arrangement is tandem RRMs, with two domains connected by a variable linker. Despite their prevalence, little is known about the features that lead to specific arrangements, and especially the role of the connecting linker. In this work, we present a novel and robust way to investigate the relative domain orientation in multi-domain proteins using inter-domain vectors referenced to a stable secondary structure element. We apply this method to tandem RRM domains and cluster experimental tandem RRM structures according to their inter-domain and linker-domain contacts, and report how this correlates with their orientation. By extending our analysis to AlphaFold2 predicted structures, with particular attention to the inter-domain predicted aligned error, we identify new orientations not reported experimentally. Our analysis provides novel insights across a range of tandem RRM orientations that may help in the design of proteins with a specific RNA binding mode.
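
One way to picture an inter-domain vector referenced to a secondary structure element is sketched below. The abstract does not give the authors' exact definition, so this is a hypothetical formulation: the vector joins the two domain centroids, and its orientation is reported as an angle to an axis defined by two points on a reference element (for example the ends of a helix). All function names and coordinates are illustrative assumptions.

```python
import math

Vec = tuple[float, float, float]


def centroid(points: list[Vec]) -> Vec:
    """Arithmetic mean of a list of 3D coordinates."""
    n = len(points)
    return (
        sum(p[0] for p in points) / n,
        sum(p[1] for p in points) / n,
        sum(p[2] for p in points) / n,
    )


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def norm(a: Vec) -> float:
    return math.sqrt(dot(a, a))


def interdomain_orientation(
    domain1: list[Vec], domain2: list[Vec], ref_start: Vec, ref_end: Vec
) -> tuple[float, float]:
    """Return (distance, angle in degrees) of the centroid-to-centroid
    vector, with the angle measured against the reference axis
    ref_start -> ref_end."""
    v = sub(centroid(domain2), centroid(domain1))
    axis = sub(ref_end, ref_start)
    cos_angle = dot(v, axis) / (norm(v) * norm(axis))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    return norm(v), math.degrees(math.acos(cos_angle))


# Toy coordinates (hypothetical, not from any structure).
domain1 = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
domain2 = [(0.0, 10.0, 0.0), (2.0, 10.0, 0.0)]
dist, angle = interdomain_orientation(
    domain1, domain2, ref_start=(0.0, 0.0, 0.0), ref_end=(5.0, 0.0, 0.0)
)
```

Anchoring the angle to an element inside one domain makes the descriptor independent of the global frame of each structure file, which is what allows orientations from different structures to be compared and clustered.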

4.
J Mol Biol ; 436(4): 168444, 2024 02 15.
Article in English | MEDLINE | ID: mdl-38218366

ABSTRACT

Many examples are known of regions of intrinsically disordered proteins that fold into α-helices upon binding to their targets. These helical binding motifs (HBMs) can be partially helical also in the unbound state, and this so-called residual structure can affect binding affinity and kinetics. To investigate the underlying mechanisms governing the formation of residual helical structure, we assembled a dataset of experimental helix contents of 65 peptides containing HBMs that fold upon binding. The average residual helicity is 17% and increases to 60% upon target binding. The helix contents of residual and target-bound structures do not correlate; however, the relative location of helix elements in both states shows a strong overlap. Compared to the general disordered regions, HBMs are enriched in amino acids with high helix preference and these residues are typically involved in target binding, explaining the overlap in helix positions. In particular, we find that leucine residues and leucine motifs in HBMs are the major contributors to helix stabilization and target binding. For the two model peptides, we show that substitution of leucine motifs with other hydrophobic residues (valine or isoleucine) leads to a reduction of residual helicity, supporting the role of leucine as helix stabilizer. Of the three hydrophobic residues, only leucine can efficiently stabilize residual helical structure. We suggest that the high occurrence of leucine motifs and a general preference for leucine at binding interfaces in HBMs can be explained by its unique ability to stabilize helical elements.


Subjects
Intrinsically Disordered Proteins , Leucine , Intrinsically Disordered Proteins/chemistry , Leucine/chemistry , Peptides/chemistry , Secondary Protein Structure , Amino Acid Motifs , Datasets as Topic , Hydrophobic and Hydrophilic Interactions , Protein Binding , Chemical Models
5.
Proteins ; 92(2): 246-264, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37837263

ABSTRACT

α-1 acid glycoprotein (AGP) is one of the most abundant plasma proteins. It fulfills two important functions: immunomodulation, and binding to various drugs and receptors. These different functions are closely associated and modulated via changes in glycosylation and cancer missense mutations. From a structural point of view, glycans alter the local biophysical properties of the protein, leading to a diverse ligand-binding spectrum. However, glycans typically cannot be observed in the resolved X-ray crystallography structure of AGP due to their high flexibility and microheterogeneity, which limits our understanding of AGP's conformational dynamics 70 years after its discovery. We here investigate how mutations and glycosylation interfere with AGP's conformational dynamics, changing its biophysical behavior, by using molecular dynamics (MD) simulations and sequence-based dynamics predictions. The MD trajectories show that glycosylation decreases the local backbone flexibility of AGP and increases the flexibility of distant regions through allosteric effects. We observe that mutations near the glycosylation site affect the glycan's conformational preferences. Thus, we conclude that mutations control glycan dynamics, which modulates the protein's backbone flexibility, directly affecting its accessibility. These findings may assist in drug design targeting AGP's glycosylation and mutations in cancer.


Subjects
Neoplasms , Orosomucoid , Humans , Glycosylation , Orosomucoid/genetics , Orosomucoid/chemistry , Orosomucoid/metabolism , Molecular Conformation , Polysaccharides , Neoplasms/genetics
6.
Open Res Eur ; 3: 97, 2023.
Article in English | MEDLINE | ID: mdl-37645489

ABSTRACT

Background: Data management is fast becoming an essential part of scientific practice, driven by open science and FAIR (findable, accessible, interoperable, and reusable) data sharing requirements. Whilst data management plans (DMPs) are clear to data management experts and data stewards, understandings of their purpose and creation are often obscure to the producers of the data, which in academic environments are often PhD students. Methods: Within the RNAct EU Horizon 2020 ITN project, we engaged the 10 RNAct early-stage researchers (ESRs) in a training project aimed at formulating a DMP. To do so, we used the Data Stewardship Wizard (DSW) framework and modified the existing Life Sciences Knowledge Model into a simplified version aimed at training young scientists, with computational or experimental backgrounds, in core data management principles. We collected feedback from the ESRs during this exercise. Results: Here, we introduce our new life-sciences training DMP template for young scientists. We report and discuss our experiences as principal investigators (PIs) and ESRs during this project and address the typical difficulties that are encountered in developing and understanding a DMP. Conclusions: We found that the DS-Wizard can also be an appropriate tool for DMP training, to get terminology and concepts across to researchers. A full training additionally requires an upstream step to present basic DMP concepts and a downstream step to publish a dataset in a (public) repository. Overall, the DS-Wizard tool was essential for our DMP training and we hope our efforts can be used in other projects.

7.
Nat Methods ; 20(9): 1291-1303, 2023 09.
Article in English | MEDLINE | ID: mdl-37400558

ABSTRACT

An unambiguous description of an experiment, and the subsequent biological observation, is vital for accurate data interpretation. Minimum information guidelines define the fundamental complement of data that can support an unambiguous conclusion based on experimental observations. We present the Minimum Information About Disorder Experiments (MIADE) guidelines to define the parameters required for the wider scientific community to understand the findings of an experiment studying the structural properties of intrinsically disordered regions (IDRs). MIADE guidelines provide recommendations for data producers to describe the results of their experiments at source, for curators to annotate experimental data to community resources and for database developers maintaining community resources to disseminate the data. The MIADE guidelines will improve the interpretability of experimental results for data consumers, facilitate direct data submission, simplify data curation, improve data exchange among repositories and standardize the dissemination of the key metadata on an IDR experiment by IDR data sources.


Subjects
Intrinsically Disordered Proteins , Intrinsically Disordered Proteins/chemistry , Protein Conformation
8.
Bioinformatics ; 39(6), 2023 06 01.
Article in English | MEDLINE | ID: mdl-37252824

ABSTRACT

MOTIVATION: The generation of parameter files for molecular dynamics (MD) simulations of small molecules that are suitable for force fields commonly applied to proteins and nucleic acids is often challenging. The ACPYPE software and website aid the generation of such parameter files. RESULTS: ACPYPE uses OpenBabel and ANTECHAMBER to generate MD input files in Gromacs, AMBER, CHARMM, and CNS formats. It can now take a SMILES string as input, in addition to the original PDB or mol2 coordinate files, with GAFF2 support and GLYCAM force field conversion added. It can be installed locally via Anaconda, PyPI, and Docker distributions, while the web server at https://bio2byte.be/acpype/ was updated with an API, and provides visualization of results for uploaded molecules as well as a pre-generated set of 3738 drug molecules. AVAILABILITY AND IMPLEMENTATION: The web application is freely available at https://www.bio2byte.be/acpype/ and the open-source code can be found at https://github.com/alanwilter/acpype.


Subjects
Nucleic Acids , Software , Computers , Proteins/metabolism , Molecular Dynamics Simulation
11.
PLoS Comput Biol ; 19(1): e1010859, 2023 01.
Article in English | MEDLINE | ID: mdl-36689472

ABSTRACT

RNA recognition motifs (RRMs) are the most prevalent class of RNA binding domains in eukaryotes. Their RNA binding preferences have been investigated for almost two decades, and even though some RRM domains are now very well described, their RNA recognition code has remained elusive. An increasing number of experimental structures of RRM-RNA complexes has become available in recent years. Here, we perform an in-depth computational analysis to derive an RNA recognition code for canonical RRMs. We present and validate a computational scoring method to estimate the binding between an RRM and a single-stranded RNA, based on structural data from a carefully curated multiple sequence alignment, which can predict RRM binding RNA sequence motifs based on the RRM protein sequence. Given the importance and prevalence of RRMs in humans and other species, this tool could help design RNA binding motifs with uses in medical or synthetic biology applications, leading towards the de novo design of RRMs with specific RNA recognition.


Subjects
RNA Recognition Motif , RNA , Humans , RNA/chemistry , Amino Acid Sequence , Sequence Alignment , Nucleotide Motifs/genetics , Protein Binding , Binding Sites
12.
Nat Commun ; 14(1): 241, 2023 01 16.
Article in English | MEDLINE | ID: mdl-36646716

ABSTRACT

Deep mutational scanning is a powerful approach to investigate a wide variety of research questions including protein function and stability. Here, we perform deep mutational scanning on three essential E. coli proteins (FabZ, LpxC and MurA) involved in cell envelope synthesis using high-throughput CRISPR genome editing, and study the effect of the mutations in their original genomic context. We use more than 17,000 variants of the proteins to interrogate protein function and the importance of individual amino acids in supporting viability. Additionally, we exploit these libraries to study resistance development against antimicrobial compounds that target the selected proteins. Among the three proteins studied, MurA seems to be the superior antimicrobial target due to its low mutational flexibility, which decreases the chance of acquiring resistance-conferring mutations that simultaneously preserve MurA function. Additionally, we rank anti-LpxC lead compounds for further development, guided by the number of resistance-conferring mutations against each compound. Our results show that deep mutational scanning studies can be used to guide drug development, which we hope will contribute towards the development of novel antimicrobial therapies.


Subjects
Anti-Bacterial Agents , Escherichia coli Proteins , Anti-Bacterial Agents/pharmacology , Anti-Bacterial Agents/chemistry , Bacterial Proteins/metabolism , Escherichia coli/metabolism , Mutation , Escherichia coli Proteins/genetics , Escherichia coli Proteins/pharmacology
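
The notion of "mutational flexibility" used in the abstract above can be made concrete with a small calculation. The abstract does not define the metric, so the sketch below assumes a simple one: the fraction of tested variants that still support viability. The function name, variant labels, and viability calls are made up for illustration and are not data from the study.

```python
def mutational_flexibility(variant_viable: dict[str, bool]) -> float:
    """Fraction of tested variants that still support viability
    (an assumed, simplified definition)."""
    if not variant_viable:
        raise ValueError("no variants supplied")
    return sum(variant_viable.values()) / len(variant_viable)


# Toy, made-up viability calls; keys are hypothetical variant labels.
toy_protein_a = {"A10V": True, "G25D": False, "L40P": False, "K77R": True}
toy_protein_b = {"D12E": False, "R30H": False, "S55T": True, "P90L": False}

flex_a = mutational_flexibility(toy_protein_a)  # 2 of 4 variants viable
flex_b = mutational_flexibility(toy_protein_b)  # 1 of 4 variants viable
```

Under this reading, a protein with a lower value tolerates fewer substitutions, leaving less room for a mutation that confers resistance while preserving function.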
13.
Nat Microbiol ; 8(1): 77-90, 2023 01.
Article in English | MEDLINE | ID: mdl-36593295

ABSTRACT

Clustered regularly interspaced short palindromic repeats (CRISPR)-associated Cas9 is an effector protein that targets invading DNA and plays a major role in the prokaryotic adaptive immune system. Although Streptococcus pyogenes CRISPR-Cas9 has been widely studied and repurposed for applications including genome editing, its origin and evolution are poorly understood. Here, we investigate the evolution of Cas9 from resurrected ancient nucleases (anCas) in extinct firmicutes species that last lived 2.6 billion years before the present. We demonstrate that these ancient forms were much more flexible in their guide RNA and protospacer-adjacent motif requirements compared with modern-day Cas9 enzymes. Furthermore, anCas portrays a gradual palaeoenzymatic adaptation from nickase to double-strand break activity, exhibits high levels of activity with both single-stranded DNA and single-stranded RNA targets and is capable of editing activity in human cells. Prediction and characterization of anCas with a resurrected protein approach uncovers an evolutionary trajectory leading to functionally flexible ancient enzymes.


Subjects
CRISPR-Cas Systems , Endonucleases , Firmicutes , CRISPR-Associated Protein 9/genetics , CRISPR-Associated Protein 9/metabolism , Endonucleases/genetics , Endonucleases/metabolism , Gene Editing , Firmicutes/enzymology , Firmicutes/genetics , CRISPR-Cas Systems Guide RNA
14.
Proteins ; 91(6): 771-780, 2023 06.
Article in English | MEDLINE | ID: mdl-36629258

ABSTRACT

Inactive rhodopsin can absorb photons, which induces different structural transitions that finally activate rhodopsin. We have examined the change in spatial configurations and physicochemical factors that result during the transition mechanism from the inactive to the active rhodopsin state via intermediates. During the activation process, many existing atomic contacts are disrupted, and new ones are formed. This is related to the movement of Helix 5, which tilts away from Helix 3 in the intermediate state in lumirhodopsin and moves closer to Helix 3 again in the active state. Similar patterns of changing atomic contacts are observed between Helices 3 and 5 of the adenosine and neurotensin receptors. In addition, residues 220-238 of rhodopsin, which are disordered in the inactive state, fold in the active state before binding to the Gα, where it catalyzes GDP/GTP exchange on the Gα subunit. Finally, molecular dynamics simulations in the membrane environment revealed that the arrestin binding region adopts a more flexible extended conformation upon phosphorylation, likely promoting arrestin binding and inactivation. In summary, our results provide additional structural understanding of specific rhodopsin activation which might be relevant to other Class A G protein-coupled receptor proteins.


Subjects
G-Protein-Coupled Receptors , Rhodopsin , Animals , Cattle , Rhodopsin/chemistry , Rhodopsin/metabolism , Protein Conformation , G-Protein-Coupled Receptors/chemistry , Molecular Dynamics Simulation , Arrestins/metabolism
15.
EMBO J ; 41(23): e111344, 2022 12 01.
Article in English | MEDLINE | ID: mdl-36031863

ABSTRACT

Secretory preproteins of the Sec pathway are targeted post-translationally and cross cellular membranes through translocases. During cytoplasmic transit, mature domains remain non-folded for translocase recognition/translocation. After translocation and signal peptide cleavage, mature domains fold to native states in the bacterial periplasm or traffic further. We sought the structural basis for delayed mature domain folding and how signal peptides regulate it. We compared how evolution diversified a periplasmic peptidyl-prolyl isomerase PpiA mature domain from its structural cytoplasmic PpiB twin. Global and local hydrogen-deuterium exchange mass spectrometry showed that PpiA is a slower folder. We defined at near-residue resolution hierarchical folding initiated by similar foldons in the twins, at different order and rates. PpiA folding is delayed by fewer hydrophobic native contacts, frustrated residues and a β-turn in the earliest foldon and by signal peptide-mediated disruption of foldon hierarchy. When selected PpiA residues and/or its signal peptide were grafted onto PpiB, they converted it into a slow folder with enhanced in vivo secretion. These structural adaptations in a secretory protein facilitate trafficking.


Subjects
Protein Folding , Protein Sorting Signals , Protein Sorting Signals/genetics , Proteins/metabolism , Cell Membrane/metabolism , Hydrophobic and Hydrophilic Interactions
16.
Front Mol Biosci ; 9: 959956, 2022.
Article in English | MEDLINE | ID: mdl-35992270

ABSTRACT

Traditionally, our understanding of how proteins operate and how evolution shapes them is based on two main data sources: the overall protein fold and the protein amino acid sequence. However, a significant part of the proteome shows highly dynamic and/or structurally ambiguous behavior, which cannot be correctly represented by the traditional fixed set of static coordinates. Representing such protein behaviors remains challenging and necessarily involves a complex interpretation of conformational states, including probabilistic descriptions. Relating protein dynamics and multiple conformations to their function as well as their physiological context (e.g., post-translational modifications and subcellular localization), therefore, remains elusive for much of the proteome, with studies to investigate the effect of protein dynamics relying heavily on computational models. We here investigate the possibility of delineating three classes of protein conformational behavior: order, disorder, and ambiguity. These definitions are explored based on three different datasets, using interpretable machine learning from a set of features, from AlphaFold2 to sequence-based predictions, to understand the overlap and differences between these datasets. This forms the basis for a discussion on the current limitations in describing the behavior of dynamic and ambiguous proteins.

17.
J Proteome Res ; 21(8): 1894-1915, 2022 08 05.
Article in English | MEDLINE | ID: mdl-35793420

ABSTRACT

Protein phosphorylation is the most common reversible post-translational modification of proteins and is key in the regulation of many cellular processes. Due to this importance, phosphorylation is extensively studied, resulting in the availability of a large amount of mass spectrometry-based phospho-proteomics data. Here, we leverage the information in these large-scale phospho-proteomics data sets, as contained in Scop3P, to analyze and characterize proteome-wide protein phosphorylation sites (P-sites). First, we set out to differentiate correctly observed P-sites from false-positive sites using five complementary site properties. We then describe the context of these P-sites in terms of the protein structure, solvent accessibility, structural transitions and disorder, and biophysical properties. We also investigate the relative prevalence of disease-linked mutations on and around P-sites. Moreover, we assess the structural dynamics of P-sites in their phosphorylated and unphosphorylated states. As a result, we show how large-scale reprocessing of available proteomics experiments can enable a more reliable view on proteome-wide P-sites. Furthermore, adding the structural context of proteins around P-sites helps uncover possible conformational switches upon phosphorylation. Moreover, by placing sites in different biophysical contexts, we show the differential preference in protein dynamics at phosphorylated sites when compared to the nonphosphorylated counterparts.


Subjects
Proteome , Proteomics , Humans , Mass Spectrometry , Phosphorylation , Post-Translational Protein Processing , Proteome/metabolism , Proteomics/methods
18.
J Mol Biol ; 434(12): 167579, 2022 06 30.
Article in English | MEDLINE | ID: mdl-35469832

ABSTRACT

The role of intrinsically disordered protein regions (IDRs) in cellular processes has become increasingly evident over the last years. These IDRs continue to challenge structural biology experiments because they lack a well-defined conformation, and bioinformatics approaches that accurately delineate disordered protein regions remain essential for their identification and further investigation. Typically, these predictors use the protein amino acid sequence, without taking into account likely sequence-dependent emergent properties, such as protein backbone dynamics. Here we present DisoMine, a method that predicts protein 'long disorder' with recurrent neural networks from simple predictions of protein dynamics, secondary structure and early folding. The tool is fast and requires only a single sequence, making it applicable for large-scale screening, including poorly studied and orphan proteins. DisoMine is a top performer in its category and compares well to disorder prediction approaches using evolutionary information. DisoMine is freely available through an interactive webserver at https://bio2byte.be/disomine/.


Subjects
Intrinsically Disordered Proteins , Computer Neural Networks , Protein Sequence Analysis , Software , Amino Acid Sequence , Computational Biology/methods , Intrinsically Disordered Proteins/chemistry , Secondary Protein Structure , Protein Sequence Analysis/methods
19.
Nat Commun ; 12(1): 6414, 2021 11 05.
Article in English | MEDLINE | ID: mdl-34741024

ABSTRACT

While transcriptome- and proteome-wide technologies to assess processes in protein biogenesis are now widely available, we still lack global approaches to assay post-ribosomal biogenesis events, in particular those occurring in the eukaryotic secretory system. We here develop a method, SECRiFY, to simultaneously assess the secretability of >10^5 protein fragments by two yeast species, S. cerevisiae and P. pastoris, using custom fragment libraries, surface display and a sequencing-based readout. Screening human proteome fragments with a median size of 50-100 amino acids, we generate datasets that enable datamining into protein features underlying secretability, revealing a striking role for intrinsic disorder and chain flexibility. The SECRiFY methodology generates sufficient amounts of annotated data for advanced machine learning methods to deduce secretability patterns. The finding that secretability is indeed a learnable feature of protein sequences provides a solid base for application-focused studies.


Subjects
Saccharomyces cerevisiae/metabolism , Humans , Proteome/genetics , Proteome/physiology , Transcriptome/genetics , Transcriptome/physiology
20.
Comput Struct Biotechnol J ; 19: 4919-4930, 2021.
Article in English | MEDLINE | ID: mdl-34527196

ABSTRACT

Protein folding and function are closely connected, but the exact mechanisms by which proteins fold remain elusive. Early folding residues (EFRs) are amino acids within a particular protein that induce the very first stages of the folding process. High-resolution EFR data are only available for few proteins, which has previously enabled the training of a protein sequence-based machine learning 'black box' predictor (EFoldMine). Such a black box approach does not allow a direct extraction of the 'early folding rules' embedded in the protein sequence, whilst such interpretation is essential to improve our understanding of how the folding process works. We here apply and investigate a novel 'grey box' approach to the prediction of EFRs from protein sequence to gain mechanistic residue-level insights into the sequence determinants of EFRs in proteins. We interpret the rule set for three datasets, a default set comprised of natural proteins, a scrambled set comprised of the scrambled default set sequences, and a set of de novo designed proteins. Finally, we relate these data to the secondary structure adopted in the folded protein and provide all information online via http://xefoldmine.bio2byte.be/, as a resource to help understand and steer early protein folding.
